Abstract: Because there are practical limits to building ever-faster computers, one has to find ways to maximize the performance of the available hardware. A distributed system consists of several autonomous nodes. Some of these nodes are busy processing while others are idle. To make better use of the hardware, tasks (load) on an overloaded node are sent to an underloaded node that has less processing weight, which minimizes the response time of the tasks. Load balancing is an effective tool for balancing the load among the systems. Dynamic load balancing takes the current system state into account when migrating tasks from heavily loaded nodes to lightly loaded nodes. In this paper, we devise an adaptive load-sharing algorithm that balances the load by taking into consideration the connectivity among the nodes, the processing capacity of each node and the link capacity.
I. INTRODUCTION

An important attribute of a dynamic load-balancing policy is how the load-balancing activity is initiated, that is, which node is responsible for detecting load imbalance among the nodes [9]. A load-balancing algorithm is invoked when load imbalance among the nodes is detected. The way the load-balancing activity is initiated has a significant impact on complexity, overhead and scalability. When the overloaded node transfers its excess load to an underloaded node, the algorithm is called sender-initiated; when an underloaded node requests load from an overloaded node, it is called receiver-initiated [6] [8].

Domain balancing is used to decentralize the balancing process by limiting its scope and reducing the complexity of the load-balancing algorithm. A domain is defined as a subset of nodes in a system such that a load-balancing algorithm can be applied to this subset of nodes in a single step. Domain balancing is used in load-balancing algorithms to decentralize the balancing. The balancing domains are further divided into two types. The first type is overlapped domains, in which the node initiating the balancing activity balances its load by migrating tasks or load units with the set of surrounding nodes [3].

Global balancing is achieved by balancing every domain and by diffusing the excess load throughout the overlapped domains in a distributed system. Another important attribute of a load-balancing algorithm is the degree of information, which plays an important role in making load-balancing decisions. To achieve global load balancing in a few steps, the load-balancing algorithm should obtain accurate, current information from the nodes rather than obsolete information. In general, the information collected by a node is restricted to its domain or to its nearest neighboring nodes (those directly connected to it) [4].

Although collecting information from all the nodes in a distributed system gives exact knowledge of the system, it introduces a large communication delay, which has a negative impact on the load-balancing algorithm. In such cases, it has been observed that the average response time is lower without load balancing than with it, because load balancing induces overhead in migrating load from one node to another [5].

This section presents an abstract view of the load-balancing software. The distributed system consists of several nodes, and the same load-balancing software is installed and run on all of them. Because every node runs the same software, each node takes load-balancing decisions locally (decentralized) using information collected from its neighboring nodes, as opposed to a centralized load-balancing policy [14].

The program must be multi-threaded to implement load balancing in a distributed system. Two transport protocols are available for communication: TCP and UDP. UDP is preferable because it incurs less communication overhead. In general, the architecture provides three layers: the communication layer, the load-balancing layer and the application layer [14][10]. Two data structures were used for storing information.

The communication layer is responsible for four phases: node status information, node status reception, task reception and task migration. The node status information phase disseminates the load information to the node that requested it. Because the exchange of information has a profound effect on the load-balancing decision, it has to be done at the predefined time intervals specified on each node [7] [14].

The status reception phase is responsible for receiving status information from the other nodes and updating the node list on the local node that runs this phase. Here it is possible to distinguish old information from new information. The technique used is to associate a timestamp with the information received from a node: $TS_{ij}^{Inf}$ denotes the timestamp attached to the information received by node i from node j. The local node i keeps the status of node j in memory. If an estimate for node j already exists in node i's memory, its timestamp is compared with the timestamp of the received message; the entry with the older timestamp is dropped and the newer one is saved, since the older entry holds obsolete information [11][1][2].
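The timestamp comparison above can be sketched in Python as follows. This is a minimal sketch, assuming each status message carries the sender's id, its queue size and the sender's timestamp; the class and method names (`StatusTable`, `receive`, `queue_size`) are hypothetical and not taken from the paper's implementation. A report is stored only if its timestamp is strictly newer than the one already held; duplicates are treated as stale.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StatusEntry:
    """Load information that node i holds about one neighbor j."""
    queue_size: int
    timestamp: float  # timestamp attached by node j when it sent the status


class StatusTable:
    """Node i's list of neighbor status, keeping only the newest report per node."""

    def __init__(self) -> None:
        self._entries: dict[int, StatusEntry] = {}

    def receive(self, sender: int, queue_size: int, timestamp: float) -> bool:
        """Store a status message from `sender`; return False if it was dropped as obsolete."""
        current = self._entries.get(sender)
        if current is not None and timestamp <= current.timestamp:
            return False  # not newer than what we already hold
        self._entries[sender] = StatusEntry(queue_size, timestamp)
        return True

    def queue_size(self, node: int) -> int | None:
        """Latest known queue size of `node`, or None if no report has arrived yet."""
        entry = self._entries.get(node)
        return None if entry is None else entry.queue_size
```

For example, after `receive(3, 120, 10.0)`, a late-arriving `receive(3, 90, 8.5)` returns `False` and `queue_size(3)` still reports 120.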

Once a node has collected the above information, it knows whether it is overloaded or underloaded. If it is overloaded, it transmits its excess tasks (load) to the underloaded nodes in the "task transmission" phase. The next load-balancing activity is initiated only when the current migration of load units to the underloaded nodes has completed.

The "task reception" phase is responsible for listening for requests and accepting the tasks sent from other nodes. As the above description shows, initiating a new load-balancing activity takes at least three time instants: the first for receiving the status of all the nodes, the second for determining the underloaded nodes and computing the excess load, and the third for transferring the excess load to the underloaded nodes determined in the second instant. Hence a new load-balancing activity can take place only at the fourth time instant [12] [14].

Some papers [3] [9] [10] assume that nodes do not fail. The problem arises when nodes do fail, which is common in distributed systems. A communication link may also fail, making a node unreachable. These two kinds of failure, of a node and of a communication link, greatly affect load-balancing algorithms. Consider the following scenario. The overloaded node has collected load information from its neighboring nodes and found that some of them are lightly loaded, as discussed earlier. When, at a given time instant, the node tries to send its excess load to an underloaded node, it will not succeed if that node has failed; a node may fail after sending its status information. If this happens, an alternative must be chosen to avoid a failure of the load-balancing algorithm.

II. NOTATIONS & ASSUMPTIONS 

$N$ : Number of nodes

$V = \{1, 2, \ldots, N\}$ : Set of nodes in the system

$W_i(t)$ : Expected waiting time experienced by a task inserted into the queue at the i-th node at time t

$A_i(t)$ : Rate of generation of waiting time on the i-th node caused by the addition of tasks at time t

$S_i(t)$ : Rate of reduction in waiting time caused by the service of tasks at the i-th node at time t

$r_i(t)$ : Rate of removal (transfer) of tasks from node i at time t by the load-balancing algorithm at node i

$ts_i$ : Average completion time of a task at node i

$b_i$ : Average size in bytes of a task at node i when it is transferred

$d_{ij}$ : Transfer rate in bytes/sec between node i and node j

$\bar{q}_i(t)$ : Average queue size calculated by node i based on its domain information at time t

$D_i$ : Neighboring nodes of i, defined as $D_i = \{\, j \mid j \in V \text{ and } (i, j) \in E \,\}$, where $V = \{1, 2, \ldots, N\}$

$E_i(t)$ : Excess number of tasks at node i at time t

$f_{ij}$ : Portion of the excess tasks of node i to be transferred to node j, as decided by the load-balancing algorithm

The following assumptions were made in this paper:

1. The distributed system consists of N heterogeneous nodes interconnected by an underlying arbitrary communication network. Each node i in the system has a processing weight $P_i > 0$ and a processing capacity $S_i > 0$. The load is defined to be $L_i = P_i / S_i$. In the homogeneous case, $L_i = P_i$.

2. Tasks arrive at node i according to a Poisson process with rate $A_i(t)$. A task that arrives at node i may be processed locally or migrated through the network to another node j for remote processing. Once a task is migrated, it remains there until its completion.

3. A communication delay is incurred when a task is transferred from one node to another before the task can be processed. The communication delays are different for each link.
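Assumptions 1 and 2 can be illustrated with a short Python sketch: the load $L_i = P_i/S_i$ and a generator of Poisson arrival instants. This is a minimal sketch, assuming a constant arrival rate (the paper allows a time-varying rate); the function names `node_load` and `poisson_arrivals` are hypothetical. It relies on the standard fact that the gaps between Poisson arrivals are exponentially distributed with mean 1/rate.

```python
from __future__ import annotations

import random


def node_load(processing_weight: float, processing_capacity: float) -> float:
    """Load of node i as in assumption 1: L_i = P_i / S_i."""
    if processing_weight <= 0 or processing_capacity <= 0:
        raise ValueError("P_i and S_i must both be positive")
    return processing_weight / processing_capacity


def poisson_arrivals(rate: float, horizon: float, rng: random.Random) -> list[float]:
    """Arrival instants in [0, horizon) of a Poisson process with constant `rate`.

    Inter-arrival gaps are exponential with mean 1/rate, so arrivals are
    generated by summing exponential draws until the horizon is passed.
    """
    arrivals: list[float] = []
    t = rng.expovariate(rate)
    while t < horizon:
        arrivals.append(t)
        t += rng.expovariate(rate)
    return arrivals
```

With a processing capacity of 1, `node_load` returns the processing weight itself, matching the homogeneous case.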

Each node contains an independent queue to which arriving tasks are added, which results in an accumulation of waiting time. Load balancing must be done repeatedly to maintain load balance in the system. The proposed algorithm is distributed in nature, meaning that each node runs the load-balancing algorithm autonomously.

The second level of the system is the load-balancing layer, which consists of the load-balancing algorithms. The load-balancing process is initiated either at predefined time instants or at randomly generated instants, which are kept in a file. The algorithm determines the portion of the excess load to be sent to each underloaded node based on the current state of the node and the availability of the nodes in the network. The load-balancing algorithm must consider the communication delay when migrating tasks to other nodes. The algorithm marks the tasks selected for migration as inactive, so that the application on the current node does not execute them during the transition period. After the task transmission activity completes, tasks that were not transmitted to any node are set back to active, while tasks that were transmitted to other nodes during the task transmission phase are removed from the task queue of the current node.
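The inactive/active bookkeeping described above might look like the following Python sketch. It assumes tasks are identified by unique integer ids and that migrating tasks are taken from the tail of the queue; both choices, and the names `Task`, `TaskQueue`, `select_for_migration` and `finish_transmission`, are illustrative assumptions rather than details given in the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: int
    active: bool = True  # inactive tasks are skipped by the local execution thread


@dataclass
class TaskQueue:
    tasks: list[Task] = field(default_factory=list)

    def select_for_migration(self, count: int) -> list[Task]:
        """Mark up to `count` active tasks from the tail of the queue as inactive."""
        if count <= 0:
            return []
        chosen = [t for t in self.tasks if t.active][-count:]
        for t in chosen:
            t.active = False  # the application must not start these during transmission
        return chosen

    def finish_transmission(self, selected: list[Task], sent_ids: set[int]) -> None:
        """End of the task-transmission phase: drop sent tasks, re-activate the others."""
        sent = {t.task_id for t in selected if t.task_id in sent_ids}
        self.tasks = [t for t in self.tasks if t.task_id not in sent]
        for t in selected:
            if t.task_id not in sent:
                t.active = True
```

If two tasks are selected but only one is accepted by the receiving node, the accepted one leaves the queue and the other becomes runnable again locally.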

The application layer consists of two threads of control: the task input thread and the task execution thread. The task input thread creates the number of tasks defined in the initialization file and inserts them into the task queue. It is also responsible for adding new tasks to the task queue, whether they originate at the current node or arrive from other nodes in the system. The task execution thread is responsible for executing the tasks and for updating the QSize variable when it removes a task from the task queue.
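The two application-layer threads can be sketched in Python with the standard `threading` and `queue` modules. This is a minimal sketch, assuming the number of initial tasks is passed as a parameter instead of being read from the initialization file, and that a `None` sentinel ends execution so the example terminates (a real node would keep running). `ApplicationLayer` and its methods are hypothetical names; `q_size` stands in for the paper's QSize and is guarded by a lock because both threads update it.

```python
import queue
import threading
import time


class ApplicationLayer:
    """Task-input and task-execution threads sharing one task queue."""

    def __init__(self) -> None:
        self.tasks: queue.Queue = queue.Queue()
        self.q_size = 0          # plays the role of the paper's QSize variable
        self.completed = []      # ids of finished tasks, in completion order
        self._lock = threading.Lock()  # QSize is updated by both threads

    def add_task(self, task_id) -> None:
        """Insert a task, whether created locally or received from another node."""
        with self._lock:
            self.q_size += 1
        self.tasks.put(task_id)

    def task_input(self, initial_tasks: int) -> None:
        for task_id in range(initial_tasks):
            self.add_task(task_id)
        self.tasks.put(None)  # sentinel so this sketch terminates

    def task_execution(self, service_time: float) -> None:
        while True:
            task_id = self.tasks.get()
            if task_id is None:
                break
            time.sleep(service_time)  # stand-in for running the task
            with self._lock:
                self.q_size -= 1
            self.completed.append(task_id)
```

To run it, start `task_input` and `task_execution` in two `threading.Thread` objects and join both; with a single producer the FIFO queue preserves task order.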

The load-balancing policy must take into account the processing capacity of a node when migrating tasks to it. Because the policy is decentralized, the selected node may become a candidate for more than one overloaded node at the same time instant. Another issue to consider is variable task completion times. Since these issues cannot be accounted for a priori, a load-balancing strategy must adapt to dynamic state changes in the system and transfer tasks accordingly. Even this can result in tasks shuttling between nodes, so a migration limit should be set for each task to avoid task thrashing.
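A per-task migration limit can be enforced with a simple counter, as in this Python sketch. The paper does not state the limit value, so `MAX_MIGRATIONS = 2` is a hypothetical choice, as are the names `MigratableTask` and `try_migrate`.

```python
from dataclasses import dataclass

MAX_MIGRATIONS = 2  # hypothetical limit; the paper does not state a value


@dataclass
class MigratableTask:
    task_id: int
    migrations: int = 0  # how many times this task has already been moved


def try_migrate(task: MigratableTask, limit: int = MAX_MIGRATIONS) -> bool:
    """Record one more migration if the task is still under its limit.

    Returning False keeps the task on its current node, which stops it from
    shuttling back and forth (task thrashing) as loads fluctuate.
    """
    if task.migrations >= limit:
        return False
    task.migrations += 1
    return True
```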

Another issue to consider when migrating tasks from one node to another is communication overhead. Large communication delays have a negative impact on the load-balancing policy, so transfer delays must be taken into account when migrating a task. A task is considered for migration only when its completion time on the current node is greater than its completion time on another node, including the communication overhead.
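The migration criterion can be written as a comparison of two completion-time estimates. In this Python sketch the estimates are assumptions, not formulas from the paper: locally, the task finishes after the $q_i$ tasks in node i's queue (itself included); remotely, it pays the transfer delay $b_i/d_{ij}$ and then waits for node j's $q_j$ tasks plus itself. All times are in seconds, and the function names are hypothetical.

```python
def transfer_time(task_bytes: float, rate_bytes_per_s: float) -> float:
    """Communication delay for one task: size b_i divided by transfer rate d_ij."""
    return task_bytes / rate_bytes_per_s


def should_migrate(q_i: int, ts_i: float, q_j: int, ts_j: float,
                   task_bytes: float, rate_ij: float) -> bool:
    """Migrate only if finishing remotely, transfer delay included, is faster.

    Local estimate: q_i tasks (this one included) at ts_i each.
    Remote estimate: transfer delay, then q_j queued tasks plus this one at ts_j each.
    """
    local_completion = q_i * ts_i
    remote_completion = transfer_time(task_bytes, rate_ij) + (q_j + 1) * ts_j
    return local_completion > remote_completion
```

For instance, with 100 tasks at 0.5 s locally (50 s) against a 1 s transfer and 11 tasks at 0.5 s remotely (6.5 s), the task is worth moving; over a link ten times slower with a similar remote queue, it is not.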

III. MATHEMATICAL MODEL 

The mathematical model for load balancing in a 
given node i is given by [1][2] 

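$$\frac{dW_i(t)}{dt} = A_i - S_i - r_i(t) + \sum_{j \in D_i} f_{ji}\,\frac{ts_i}{ts_j}\,r_j(t - T_{ji}) \qquad (1)$$

$$E_i(t) = q_i(t) - \bar{q}_i(t)$$

$$r_i(t) = \begin{cases} G_i\,E_i(t), & E_i(t) > 0 \\ 0, & E_i(t) \le 0 \end{cases}$$

$$f_{ij} \ge 0, \qquad f_{ii} = 0, \qquad \sum_{j} f_{ij} = 1$$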

When a task is inserted into the task queue of node i, it experiences an expected waiting time denoted by $W_i(t)$. Let the number of tasks at the i-th node be denoted by $q_i(t)$, and let the average time needed to service a task at node i be $ts_i$. The expected (average) waiting time at node i is then

$$W_i(t) = q_i(t)\, ts_i .$$

Note that $W_i(t)/ts_i = q_i(t)$ is the number of tasks in node i's queue. Similarly, $W_k(t)/ts_k = q_k(t)$ is the queue length of some node k. If the tasks on node i were transferred to some node k, then the waiting time transferred would be

$$q_i(t)\, ts_k = W_i(t)\,\frac{ts_k}{ts_i},$$

so the fraction $ts_k/ts_i$ converts waiting time on node i to waiting time on node k.
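The conversion $W = q\,ts$ and the factor $ts_k/ts_i$ can be checked numerically with a tiny Python sketch; the function names are hypothetical and the numbers are illustrative only.

```python
def waiting_time(q: float, ts: float) -> float:
    """W = q * ts: expected waiting time of a queue of q tasks with mean service time ts."""
    return q * ts


def convert_waiting_time(w_on_i: float, ts_i: float, ts_k: float) -> float:
    """Re-express waiting time measured on node i as waiting time on node k."""
    return w_on_i * ts_k / ts_i


# 40 tasks at ts_i = 0.5 s are 20 s of waiting time on node i; on a node k
# with ts_k = 0.25 s the same 40 tasks represent only 10 s.
w_i = waiting_time(40, 0.5)
w_k = convert_waiting_time(w_i, ts_i=0.5, ts_k=0.25)
```

Converting first and computing directly on node k agree, since both equal $q_i\,ts_k$.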

$A_i$ : Waiting time generated by adding tasks at the i-th node.

$S_i$ : Rate of reduction in waiting time caused by the service of tasks at the i-th node, given by $S_i = (1 \times ts_i)/ts_i = 1$ for all $W_i(t) > 0$.

$r_i(t)$ : The rate of removal (transfer) of tasks from node i at time t by the load-balancing algorithm at node i. $f_{ij}$ is the fraction of the i-th node's tasks to be sent out to the j-th node. In more detail, $f_{ij}\, r_i(t)$ is the rate at which node i sends waiting time (tasks) to node j at time t, where $f_{ij} \ge 0$ and $f_{ii} = 0$. That is, the transfer from node i of expected waiting time (tasks) $\int_{t_1}^{t_2} r_i(t)\,dt$ in the interval $[t_1, t_2]$ to the other nodes is carried out with the j-th node receiving $f_{ij}\,(ts_j/ts_i)\int_{t_1}^{t_2} r_i(t)\,dt$, where the ratio $ts_j/ts_i$ converts waiting time on node i to waiting time on node j. As $\sum_{j} f_{ij}\int_{t_1}^{t_2} r_i(t)\,dt = \int_{t_1}^{t_2} r_i(t)\,dt$, this removes all of the waiting time $\int_{t_1}^{t_2} r_i(t)\,dt$ from node i. The quantity $f_{ij}\,(ts_j/ts_i)\, r_i(t - T_{ij})$ is the rate of increase (rate of transfer) of the expected waiting time (tasks) at node j at time t caused by transfers from node i, where $T_{ij}$ ($T_{ii} = 0$) is the time delay for a task transfer from node i to node j.

In this model, all rates are in units of the rate of change of expected waiting time, that is, time/time, which is dimensionless. As $r_i(t) \ge 0$, node i can only send tasks to other nodes and cannot initiate transfers from another node to itself. Transmitted tasks experience a delay before they are received at the other node. The control law $r_i(t) = G_i E_i(t)$ states that if the i-th node output $W_i(t)$ is above the domain average $\frac{1}{n}\sum_{j=1}^{n} W_j(t - T_{ij})$, then it sends tasks to the other nodes, while if it is below the domain average nothing is sent. The j-th node receives $f_{ij}\,(ts_j/ts_i)\int_{t_1}^{t_2} r_i(t)\,dt$ of the transferred waiting time $\int_{t_1}^{t_2} r_i(t)\,dt$, delayed by the time $T_{ij}$. The model described in (1) is the basic model for load balancing, but an important feature is to determine $f_{ij}$ for each underloaded node j. One approach is to distribute the excess load equally to all the underloaded neighbors:

$$f_{ij} = \frac{1}{n-1} \quad \text{for } i \neq j.$$

Another approach is to use the load information collected from the neighbors to determine their deficit load. Node i determines the deficit load of neighbor j using

$$\bar{q}_i - q_j(t - T_{ij}) \qquad (2)$$

Node i uses formula (2) to compute the deficiency of node j's queue with respect to node i's domain load average.

If node j's queue is above the domain average, node i does not send tasks to it. Therefore $\bar{q}_i - q_j(t - T_{ij})$ is node i's measure of how far node j is behind the domain average. Node i performs this computation for all the other nodes directly connected to it and then portions out its tasks among the nodes that fall below its domain queue average:

$$f_{ij} = \frac{\big(\bar{q}_i - q_j(t - T_{ij})\big)^{+}}{\sum_{k \in D_i}\big(\bar{q}_i - q_k(t - T_{ik})\big)^{+}}, \qquad (x)^{+} = \max(x, 0).$$

If the denominator $\sum_{k \in D_i}\big(\bar{q}_i - q_k(t - T_{ik})\big)^{+} = 0$, then the $f_{ij}$ are defined to be zero and no waiting time is transferred. If the denominator is zero, then $\bar{q}_i - q_j(t - T_{ij}) \le 0$ for all $j \in D_i$. However, by the definition of the average over node i and its neighbors (with $T_{ii} = 0$),

$$\sum_{j \in D_i}\big(\bar{q}_i - q_j(t - T_{ij})\big) + \bar{q}_i - q_i(t) = \sum_{j \in D_i \cup \{i\}}\big(\bar{q}_i - q_j(t - T_{ij})\big) = 0,$$

which implies

$$q_i(t) - \bar{q}_i = \sum_{j \in D_i}\big(\bar{q}_i - q_j(t - T_{ij})\big) \le 0.$$

That is, if the denominator is zero, node i is not above its domain queue average, so $E_i(t) \le 0$ and node i sends nothing ($r_i(t) = 0$), where $G_i$ is the gain factor.
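Both ways of choosing $f_{ij}$ can be sketched in Python. This is a minimal sketch, assuming node i's neighbors' (possibly delayed) queue reports are given as a dictionary and that the domain average includes node i's own queue, as in the derivation above; the function names are hypothetical. The zero-denominator case returns all-zero fractions, matching the argument that node i then has no excess.

```python
from __future__ import annotations


def domain_average(q_i: float, neighbor_q: dict[int, float]) -> float:
    """q̄_i: mean of node i's own queue and its neighbors' (delayed) queue reports."""
    return (q_i + sum(neighbor_q.values())) / (len(neighbor_q) + 1)


def equal_fractions(neighbors: list[int]) -> dict[int, float]:
    """Equal split f_ij = 1/(n-1) when node i's n-1 neighbors are all the other nodes."""
    if not neighbors:
        return {}
    return {j: 1.0 / len(neighbors) for j in neighbors}


def deficit_fractions(q_i: float, neighbor_q: dict[int, float]) -> dict[int, float]:
    """Deficit-based f_ij: share of node i's excess proportional to each positive deficit."""
    avg = domain_average(q_i, neighbor_q)
    deficits = {j: max(avg - q_j, 0.0) for j, q_j in neighbor_q.items()}
    total = sum(deficits.values())
    if total == 0.0:
        # No neighbor is below the average, so q_i <= avg and nothing is sent.
        return {j: 0.0 for j in neighbor_q}
    return {j: d / total for j, d in deficits.items()}
```

With $q_i = 30$ and neighbor queues 10, 20 and 40, the domain average is 25, the deficits are 15, 5 and 0, and the fractions come out as 0.75, 0.25 and 0.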

Except for the last three parameters, all of the information is known at the time of the load-balancing process. Every variable is updated before each load-balancing instance.

IV. ALGORITHM 

A. Algorithm ALS 

The current node i performs the following:

a. Calculate the average queue size $\bar{q}_i$ based on the information received from the neighboring nodes.
   if ($q_i > \bar{q}_i$) then $E_i = (q_i - \bar{q}_i) \cdot G$
   else exit.

b. Determine the participant nodes in the load-sharing process:
   $\text{Participants} = \{\, j \mid q_j < \bar{q}_i,\ j \in D_i \,\}$

c. Calculate the fraction of the load ($f_{ij}$) to be sent to each participant.

d. Calculate the maximum portion of the excess load ($f^{\max}_{ij}$):
   $f^{\max}_{ij} = \dfrac{(q_i - E_i)\, ts_j\, d_{ij}}{E_i\, b_i}$

e. $f_{ij} = \min(f_{ij}, f^{\max}_{ij})$

f. For each $j \in$ Participants:
   i. Announce to node j the willingness to send $T_{ij} = f_{ij} \cdot E_i$ tasks;
   ii. nowReceived = call procedure acceptanceFromNode_j()
   iii. if (nowReceived > 0)
        1. Transfer nowReceived tasks to j
        2. $T_{ij} = T_{ij} - \text{nowReceived}$
        end if

g. Repeat steps (a) to (f).

Procedure acceptanceFromNode_j()
   if ($(q_j + T_{ij}) < \bar{q}_j$) nowSend = $\bar{q}_j - q_j$;
   else nowSend = -1;
   return nowSend;
end acceptanceFromNode_j
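One pass of steps (a) to (f) can be sketched in Python for a single node i. Several pieces are assumptions because parts of the listing above are only partly legible. Step (c) uses the deficit-based formula from Section III. The task size used in step (d) is $b_i$. Node j's acceptance rule is a simplified variant of acceptanceFromNode_j: node j accepts as many offered tasks as keep it at or below its own domain average, and otherwise returns -1. Task counts are kept as real numbers instead of being rounded to whole tasks. Message passing is replaced by a direct function call. The names `Neighbor`, `acceptance_from_node_j` and `als_round` are hypothetical, and step (g), the repetition, is left to the caller.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Neighbor:
    q: float      # last reported queue length q_j(t - T_ij)
    ts: float     # average completion time ts_j, in seconds
    rate: float   # transfer rate d_ij, in bytes/s
    avg: float    # node j's own domain queue average (used in its acceptance test)


def acceptance_from_node_j(n: Neighbor, offered: float) -> float:
    """Assumed acceptance rule: take as much as keeps q_j at or below avg_j, else -1."""
    room = n.avg - n.q
    if room <= 0:
        return -1.0
    return min(offered, room)


def als_round(q_i: float, gain: float, task_bytes: float,
              neighbors: dict[int, Neighbor]) -> dict[int, float]:
    """One pass of steps (a)-(f); returns the number of tasks sent to each neighbor."""
    # (a) domain average and excess load E_i
    avg = (q_i + sum(n.q for n in neighbors.values())) / (len(neighbors) + 1)
    excess = (q_i - avg) * gain
    if q_i <= avg or excess <= 0:
        return {}
    # (b) participants: neighbors below node i's domain average
    participants = {j: n for j, n in neighbors.items() if n.q < avg}
    total_deficit = sum(avg - n.q for n in participants.values())
    sent: dict[int, float] = {}
    for j, n in participants.items():
        # (c) deficit-based fraction f_ij
        f = (avg - n.q) / total_deficit
        # (d) f_max_ij = (q_i - E_i) ts_j d_ij / (E_i b_i), clamped at zero
        f_max = max(q_i - excess, 0.0) * n.ts * n.rate / (excess * task_bytes)
        # (e) never offer more than the bound from step (d)
        f = min(f, f_max)
        # (f) announce the offer, then transfer whatever node j accepts
        offered = f * excess
        accepted = acceptance_from_node_j(n, offered)
        if accepted > 0:
            sent[j] = accepted
    return sent
```

In this sketch, a slow link (small $d_{ij}$) shrinks $f^{\max}_{ij}$ and therefore the offer, while a neighbor that is already close to its own domain average caps the transfer through its acceptance reply.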

In general, it is assumed that setting the gain factor G = 1 gives good performance. However, in a distributed system with large delays, where nodes hold outdated domain queue averages, G = 1 gives poor results. This phenomenon was first observed by the load-balancing group at the University of New Mexico [7]. Therefore, the G values are tuned to yield the best result. Another step added to the above algorithm tests node availability: it checks both whether a node is available and how much waiting time it can receive. The node executing ALS is permitted to send tasks to its neighbors only after receiving an acknowledgement specifying the amount of load they are able to process. The time complexity of the proposed algorithm is O(d), where d is the number of neighbors, as shown in Table 1.

Table 1: ALS Operations

| S.No. | Action | Operation | Quantity (d = number of neighbors) |
|---|---|---|---|
| 1 | Compute average queue size | Addition | d+1 |
| | | Division | d |
| | | Multiplication | d |
| 2 | Compute $E_i$ | Subtraction | 1 |
| | | Multiplication | 1 |
| 3 | Determine the participant nodes | Comparison | d |
| 4 | Compute $f_{ij}$ | Subtraction | d+1 |
| | | Division | d+1 |
| | | Multiplication | d+1 |
| 5 | Compute $f^{\max}_{ij}$ | Subtraction | 1 |
| | | Division | 1 |
| | | Multiplication | 3 |
| 6 | Compute $T_{ij}$ | Multiplication | d |
| 7 | Message to node | Transfer | d |
| 8 | Compute nowReceived | Addition | d |
| | | Comparison | d |
| | | Message transfer | d |

V. SIMULATION 

To evaluate the newly proposed load-balancing policy, a Java program was developed to test the performance of the existing and proposed algorithms. The existing algorithms ELISA and DOLB are compared with the proposed algorithm ALS; DOLB is closely related to the problem addressed here. The initial settings and parameters are shown in Table 2. The average network transfer rates between the nodes are represented by a cost adjacency matrix.

The proposed algorithm ALS is tested against DOLB and ELISA for gain values G between 0.3 and 1 in increments of 0.1. The α parameter introduced in the previous section was set to 0.05 after running several experiments and observing the behavior of the $ts_i$ parameter. Note that the load-balancing process was first triggered 40 s after the start of the system, and the strategy was then executed regularly at 20 s intervals.

Table 2: Simulation Parameters

| Parameter | Value |
|---|---|
| Number of nodes | 16, 32, 64 |
| Initial task distribution | [100, 1000] tasks distributed randomly at each node |
| Average task processing time ($ts$, in ms) | Randomly distributed in the range [300, 800] |
| Size of task (in Mb) | 1 |
| Load-balancing instances | First triggered at 5 s, then initiated every 10 s |
| Bandwidth distribution ($d_{ij}$) | A cost adjacency matrix denotes the transfer rate between the nodes |


This was done to ensure that the ts parameter had enough time to adapt and reflect the current computational power of each node before any task migration occurred between the nodes. Note that the ratios $ts_i/ts_j$ are fixed over time. The proposed and rival methods were evaluated by conducting 10 runs for each value of G between 0.3 and 1 in increments of 0.1.
Figure 1: Completion time averaged over 5 runs vs. different gain values K. The graph shows the results of the three policies for system size = 64.

Figure 2: Completion time averaged over 5 runs vs. different gain values K. The graph shows the results of the three policies for system size = 32.

Figure 3: Total number of tasks exchanged averaged over 5 runs vs. different gain values K. The graph shows the performance of the three policies for system size = 16.

VI. CONCLUSION 

The proposed algorithm performs better than the existing algorithms from the literature against which it was compared. In the simulation, we assumed tasks with no precedence constraints and no deadlines. As future work, the algorithm should be extended to consider tasks with deadlines and tasks with precedence relations.